Requirement Mining in Technical Documents
Authors
Abstract
In this paper, we first describe the linguistic characteristics of requirements, which are a specific form of argument. The discourse structures that refine or elaborate requirements are also analyzed. These discourse relations are characterized conceptually, together with the functions they fulfil. An implementation is carried out in Dislog on the platform. Dislog allows high-level specifications in logic, enabling fast and easy prototyping at a high level of linguistic adequacy.

1 The Structure of Requirement Compounds

Arguments, and in particular requirements, in written texts or dialogues seldom come in isolation as independent statements. They are often embedded in a context that indicates, e.g., circumstances, elaborations or purposes. Relations between a requirement and its context may be conceptually complex. Requirements often appear in small, closely related groups or clusters that share similar aims, where the first one is complemented, supported, reformulated, contrasted or elaborated by the subsequent ones and by additional statements. The typical configuration of a requirement compound can be summarized as follows:

CIRCUMSTANCE(S) / CONDITION(S), PURPOSE(S) --> [REQUIREMENT CONCLUSION + SUPPORT(S)]* <- PURPOSE(S), ELABORATION(S), CONCESSION(S) / CONTRAST(S)

In terms of language realization, clusters of requirements and their related context may all be included in a single sentence via coordination or subordination, or may appear as separate sentences. In both cases, the relations between the different elements of a cluster are realized by means of conjunctions, connectors, various forms of reference and punctuation. We call such a cluster a requirement compound. The idea behind this term is that the elements in a compound form a single, possibly complex, unit, which must be considered as a whole from a conceptual and argumentative point of view. Such a compound consists of a small number of sentences, so that its contents can be easily assimilated.
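As an illustration of the configuration above, the following is a minimal Python sketch of a container for a requirement compound. It is not the Dislog implementation mentioned in the abstract; the class names `Requirement` and `RequirementCompound`, their fields, and the rule that a compound needs at least one requirement are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Requirement:
    """One requirement conclusion together with its supports (hypothetical)."""
    conclusion: str
    supports: List[str] = field(default_factory=list)


@dataclass
class RequirementCompound:
    """Hypothetical container mirroring the slots of the configuration."""
    circumstances: List[str] = field(default_factory=list)  # circumstances / conditions
    purposes: List[str] = field(default_factory=list)
    requirements: List[Requirement] = field(default_factory=list)  # the [...]* part
    elaborations: List[str] = field(default_factory=list)
    concessions: List[str] = field(default_factory=list)  # concessions / contrasts

    def is_well_formed(self) -> bool:
        # Assumption of this sketch: a compound holds at least one requirement.
        return len(self.requirements) > 0


compound = RequirementCompound(
    requirements=[
        Requirement("An inventory of supplier's qualifications shall be produced.")
    ],
    elaborations=["Inventory of qualifications refers to norm YY."],
)
print(compound.is_well_formed())
```

Keeping each slot as a separate list reflects the point that a compound is treated as one unit whose parts play distinct discourse roles.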
2 Linguistic Analysis

2.1 Corpus characteristics

Our corpus of requirements comes from 3 organizations and 6 companies. It contains 1,138 pages of text extracted from 22 documents. The main features considered to validate our corpus are the following: specifications come from various industrial areas; documents are produced by various actors; requirement documents follow various authoring guidelines; requirements correspond to different conceptual levels. A typical simple example is the following:

<ReqCompound>
<definition> Inventory of qualifications refers to norm YY. </definition>
<mainReq> Periodically, an inventory of supplier’s qualifications shall be produced. </mainReq>
<secondaryReq> In addition, the supplier’s quality department shall periodically conduct a monitoring audit program. </secondaryReq>
At any time, the supplier should be able to provide evidences that EC qualification is maintained.
</ReqCompound>
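An annotated compound of this kind can be read back with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` and assumes that each closing tag in the example has a matching opening tag; the helper name `list_segments` and the label `untagged` for text outside any inner tag are hypothetical choices made for this illustration.

```python
import xml.etree.ElementTree as ET

# The example compound, with opening tags assumed to mirror the closing ones.
ANNOTATED = (
    "<ReqCompound>"
    "<definition>Inventory of qualifications refers to norm YY.</definition> "
    "<mainReq>Periodically, an inventory of supplier's qualifications "
    "shall be produced.</mainReq> "
    "<secondaryReq>In addition, the supplier's quality department shall "
    "periodically conduct a monitoring audit program.</secondaryReq> "
    "At any time, the supplier should be able to provide evidences that "
    "EC qualification is maintained."
    "</ReqCompound>"
)


def list_segments(xml_text):
    """Return (label, text) pairs for tagged and untagged parts of a compound."""
    root = ET.fromstring(xml_text)
    segments = []
    # Text that appears before the first child element, if any.
    if root.text and root.text.strip():
        segments.append(("untagged", root.text.strip()))
    for child in root:
        segments.append((child.tag, (child.text or "").strip()))
        # Text that follows a child element but is not inside any tag.
        if child.tail and child.tail.strip():
            segments.append(("untagged", child.tail.strip()))
    return segments


for label, text in list_segments(ANNOTATED):
    print(f"{label}: {text}")
```

The final sentence of the example carries no tag of its own, so the sketch reports it as untagged rather than guessing a label for it.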